Wuhan University, in collaboration with China Mobile's Jiutian AI team and Duke Kunshan University, has released VoxBlink2, an open-source audio-visual speaker recognition dataset built from YouTube data that contains over 110,000 hours of audio-visual recordings. The dataset comprises 9,904,382 high-quality audio clips and their corresponding video segments, sourced from 111,284 YouTube users, making it the largest publicly available audio-visual speaker recognition dataset to date. Its release aims to enrich open-source speech corpora and support the training of large voiceprint (speaker-embedding) models.
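A quick back-of-envelope check of the reported figures (a sketch using only the numbers stated above, not an official statistic) shows the corpus works out to roughly 40 seconds of audio per clip and just under one hour of audio per speaker:

```python
# Figures as reported in the announcement.
total_hours = 110_000        # total audio-visual recordings (approximate)
num_clips = 9_904_382        # high-quality audio clips
num_speakers = 111_284       # YouTube users (speakers)

# Derived averages (back-of-envelope only).
avg_clip_seconds = total_hours * 3600 / num_clips
avg_hours_per_speaker = total_hours / num_speakers

print(f"average clip length: ~{avg_clip_seconds:.0f} s")          # ~40 s
print(f"average audio per speaker: ~{avg_hours_per_speaker:.2f} h")  # ~0.99 h
```

These averages suggest the dataset consists of many short utterances per speaker, a typical shape for speaker-verification training corpora.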